We study how urban quality evolves as a result of carbon dioxide emissions as urban agglomerations grow.
We employ a bottom-up approach combining two unprecedented microscopic datasets on population and
carbon dioxide emissions in the continental US. We first aggregate settlements that are close to each other
into cities using the City Clustering Algorithm (CCA), defining cities beyond the administrative boundaries.
Then, we use data on CO2 emissions at a fine geographic scale to determine the total emissions of each city.
We find a superlinear scaling behavior, expressed by a power law, between CO2 emissions and city
population with average allometric exponent β = 1.46 across all cities in the US. This result suggests that the
high productivity of large cities comes at the expense of a proportionally larger amount of emissions
compared to small cities. Furthermore, our results are substantially different from those obtained by the
standard administrative definition of cities, i.e. Metropolitan Statistical Area (MSA). Specifically, MSAs
display isometric scaling of emissions, and we argue that this discrepancy is due to the overestimation of MSA
areas. The results suggest that allometric studies based on administrative boundaries to define cities may
suffer from endogeneity bias.

Allometry was originally introduced in the context of evolutionary theory [1] to describe the correlation
between the relative dimensions of body parts, for instance brain size in mammals, and changes in
overall body size. In a classical result, Kleiber showed that the metabolic rate, Y, and the body mass, X, of a large
range of mammals are related by an allometric power law Y = AX^β, where β = 3/4 is the allometric exponent and
A is a constant [2].

In analogy with biological systems, Bettencourt et al. showed that cities across the US obey allometric relations
with population size. Indeed, a large class of human activities can be grouped into three categories according to the
value of the allometric exponent: (a) Isometric behavior (linear, non-allometric or extensive, β = 1) typically
reflects the scaling with population size of individual human needs, like the number of jobs, houses, and water
consumption. (b) Allometric sublinear behavior (hypoallometric, non-extensive, β < 1) implies an economy of
scale in the quantity of interest because its per capita measurement decreases with population size. Hypoallometry
is found, for example, in the number of gasoline stations, length of electrical cables, and road surfaces (material
and infrastructure). (c) Superlinear behavior (hyperallometric, non-extensive, β > 1) emerges whenever the
pattern of social activity has significant influence on the urban indicator. Wages, income, gross domestic
product, bank deposits, as well as rates of invention measured by patents and employment in creative sectors,
display a superlinear increase with population size. These superlinear scaling laws indicate that larger cities are
associated with optimal levels of human productivity and quality of life; doubling the city size leads to a
larger-than-double increment in productivity and life standards [3-5].

The optimal productivity of large cities raises the question of the consequences of urban growth for
environmental quality. Indeed, it is intensely debated whether large cities can be considered environmentally "green",
implying that their productivity is associated with lower than expected greenhouse gas (GHG) and pollutant
emissions [6-11]. For instance, some of these studies report that the level of commuting is a major contributor to
the relation between GHG emissions and city size [6-8,11]. As a consequence, compact cities would be greener due
to the reduction of the average commuting length. More recently, however, Gaigne et al. [12] suggested that
compact cities might not be as environmentally friendly as was thought, mainly because increasing-density
policies oblige firms and households to relocate. This relocation of the urban system then generates a
higher level of pollution. In this context, we study the allometric laws associated with a particular type of
GHG emissions from human activity by studying the CO2 emissions of cities as a function of
population size. We employ a bottom-up approach combining two unprecedented microscopic datasets on
population and carbon dioxide emissions in the continental US. We first define the boundaries of cities using the City
Clustering Algorithm (CCA) [13-21], which are then used to calculate the CO2 emissions. We find a superlinear
allometric scaling law between emissions and city size. We also explore different sectors and activities of the
economy, finding superlinear behavior in most of the sectors. Our results pertain only to emissions of CO2. It will be
desirable to extend them to the rest of the GHGs. These results indicate that
large cities may not provide as many environmental advantages as
previously thought [7,9-11].

Results 

Datasets. We use two geo-referenced datasets on population and
CO2 emissions in the continental US defined on a fine geometrical
grid. The population dataset is obtained from the Global Rural-
Urban Mapping Project (GRUMPv1) [22]. These data are a
combination of gridded census and satellite data for the population of
urban and rural areas in the United States in the year 2000 (Fig. 1a and
Sec. 3). The GRUMPv1 dataset provides high-resolution gridded
population data at 30 arc-seconds, equivalent to a grid of 0.926 km
× 0.926 km at the Equator.

The emissions dataset is obtained from the Vulcan Project (VP)
compiled at Arizona State University [23]. The VP provides fossil fuel
CO2 emissions in the continental US at a spatial resolution of 10 km
× 10 km (0.1 deg × 0.1 deg grid) from 1999 to 2008. The data are
separated according to economic sectors and activities (see Sec. 3 for
details): Commercial, Industrial, and Residential sectors (obtained
from county-level aggregation of non-geocoded sources and non-
electricity producing sources from geocoded locations), Electricity
Production (geolocated sources associated with the production of
electricity such as thermal power stations), Onroad Vehicles (mobile
transport using designated roadways such as automobiles, buses, and
motorcycles), Nonroad Vehicles (mobile surface sources that do not
travel on roadways such as boats, trains, snowmobiles), Aircraft
(Airports, geolocated sources associated with taxi, takeoff, and
landing cycles associated with air travel, and Aircraft, gridded sources
associated with the airborne component of air travel), and Cement
Industry.

Figure 1 | Population and emissions in US. (a) The population map of the
contiguous US from the Global Rural-Urban Mapping Project
(GRUMPv1) [22] dataset in logarithmic scale. (b) The CO2 emissions map of
the contiguous US from the Vulcan Project (VP) dataset [23] measured in log
base 10 scale of metric tonnes of carbon per year. (c) Map of the mean
household income per capita of 3,092 US counties in dollars from the US
Census Bureau dataset [24] for the year 2000.

We analyze the annual average of emissions in 2002 for the total of
all sectors combined (see Fig. 1b) and each sector separately (Fig. 2).
The choice of 2002 data (rather than 2000 as in population) reflects
the constraint that it is the only year for which the quantification of
CO2 emissions has been achieved at the scale of individual factories,
power plants, roadways and neighborhoods and on an hourly basis [23].

To define the boundary of cities, we use the notion of spatial
continuity by aggregating settlements that are close to each other
into cities [15-18,20,21]. Such a procedure, called the City Clustering
Algorithm (CCA), considers cities as constituted of contiguous
commercial and residential areas for which we also know the emissions of
CO2 from the Vulcan Project dataset. By using two microscopically
defined datasets, we are able to match precisely the population of
each agglomeration to its rate of CO2 emissions by constructing the
urban agglomerations from the bottom up without resorting to
predefined administrative boundaries.

We also use the US income dataset made available in ASCII format by
the US Census Bureau [24] for the year 2000. This dataset provides the
mean household income per capita for the 3,092 US counties. For
each county, we combined the income data and the administrative
boundaries [25] in order to relate them with the geolocated datasets
(Fig. 1c and Sec. 3).

We first apply the CCA to construct cities by aggregating sites, where
D_i denotes the population at site i. The procedure depends on a population threshold D*
and a distance threshold ℓ. If D_i > D*, the site i is populated. The
length ℓ represents a cutoff distance between the sites to consider
them as spatially contiguous, i.e. we aggregate all nearest-neighbor
sites which are at distances smaller than ℓ. Thus a CCA cluster or city
is defined by populated sites within a distance smaller than ℓ, as seen
schematically in Fig. 3. Starting from an arbitrary seed, we add all
populated neighbors at distances to the cluster smaller than ℓ until no
more sites can be added to the cluster. The scaling laws produced by
the CCA depend weakly on D* and ℓ, and we are interested in a
region of the parameters where the scaling laws are independent of
these parameters.
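The clustering procedure described above can be sketched as a breadth-first growth from a seed site. This is a minimal illustration, not the authors' implementation: the function name `city_clustering`, the `(x_km, y_km, population)` tuple layout, and the brute-force neighbor search are all assumptions made for the example.

```python
from collections import deque
from math import hypot


def city_clustering(sites, d_star, ell):
    """Group populated sites into CCA-style clusters.

    sites  : list of (x_km, y_km, population) tuples (hypothetical layout)
    d_star : population threshold D*; a site is populated if population > D*
    ell    : cutoff distance in km; populated sites closer than ell are linked
    Returns a sorted list of clusters, each a sorted list of site indices.
    """
    unvisited = {i for i, (_, _, pop) in enumerate(sites) if pop > d_star}
    clusters = []
    while unvisited:
        seed = unvisited.pop()          # arbitrary seed
        cluster = [seed]
        queue = deque([seed])
        while queue:                    # grow until no more sites can be added
            i = queue.popleft()
            xi, yi, _ = sites[i]
            near = [j for j in unvisited
                    if hypot(sites[j][0] - xi, sites[j][1] - yi) < ell]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)
```

The neighbor search here is quadratic in the number of populated sites, which is fine for a toy example; a grid index or spatial tree would be the natural replacement for the full gridded dataset.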

This aggregation criterion based on the geographical continuity of
development was shown to provide strong evidence of Zipf's law in
the US and UK [15-18,20,21], in agreement with established results in urban
sciences [26-29]. For cut-off lengths above ℓ = 5 km, it was shown that
CCA clusters verify Zipf's law and that the Zipf exponent is
independent of ℓ. In what follows, we first present results for aggregated clusters at
ℓ = 5 km, and then show the robustness of the scaling laws over a
larger range of parameter space.

In order to assign the total CO2 emissions to a given CCA cluster,
we superimpose the obtained cluster on the CO2 emissions dataset. If
a populated site composing a CCA cluster falls inside a CO2 site, we
assign to the populated site the corresponding CO2 emissions in
proportion to its area, 0.926² km², considering that the emissions
density is constant across the CO2 site of 10² km². For a given CCA
cluster, we then calculate the population (POP) and CO2 emissions
by adding the values of the constitutive sites of the cluster.
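The area-proportional assignment above amounts to giving each populated site the fraction 0.926²/10² of the emissions of the CO2 cell that contains it. The sketch below illustrates this under simplifying assumptions: both grids share a planar coordinate system in kilometres, the CO2 grid is stored as a dictionary keyed by integer cell indices, and the names `assign_emissions` and `cluster_totals` are hypothetical.

```python
def assign_emissions(pop_sites, co2_grid, co2_cell_km=10.0, pop_cell_km=0.926):
    """Assign CO2 emissions to population sites in proportion to their area.

    pop_sites : list of (x_km, y_km) site centers
    co2_grid  : dict mapping (column, row) of a CO2 cell to its emissions
    Emission density is assumed constant within each CO2 cell.
    """
    share = pop_cell_km ** 2 / co2_cell_km ** 2   # area fraction of one site
    emissions = []
    for x, y in pop_sites:
        cell = (int(x // co2_cell_km), int(y // co2_cell_km))
        emissions.append(co2_grid.get(cell, 0.0) * share)
    return emissions


def cluster_totals(cluster, populations, emissions):
    """Sum population (POP) and CO2 over the site indices of one cluster."""
    pop = sum(populations[i] for i in cluster)
    co2 = sum(emissions[i] for i in cluster)
    return pop, co2
```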

Figure 2 | The CO2 emissions maps in metric tonnes of carbon per year from the Vulcan Project (VP) dataset [23] for each sector: (a) Aircraft, (b) Cement, (c)
Commercial, (d) Industrial, (e) On-road, (f) Non-road, (g) Residential and (h) Electricity.

Scaling of emissions with city size. Figure 4 shows the correlation
between the total annual CO2 emissions and POP for each CCA
cluster for ℓ = 5 km and D* = 1000 (N = 2281). We perform a
non-parametric regression with bootstrapped 95% confidence
bands [30,31] (see Sec. 3). We find that the emissions grow with the
size of the cities, on average, faster than the expected linear
behavior. The result can be approximated over many orders of
magnitude by a power law, yielding the following allometric
scaling law:

$$\log(\mathrm{CO_2}) = A + \beta\,\log(\mathrm{POP}) \qquad (1)$$

where A = 2.05 ± 0.12 and β = 1.38 ± 0.03 (R² = 0.76) is the
allometric scaling exponent obtained from Ordinary Least Squares
(OLS) analysis [32] for this particular set of parameters, ℓ = 5 km and
D* = 1000 (see Sec. 3 for details on OLS and on the estimation of the
exponent error; all emissions are measured in log base 10 of metric
tonnes of carbon per year).
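A fit of the form of Eq. 1 can be reproduced with a closed-form least-squares regression on the base-10 logarithms. The following is a minimal sketch, not the authors' code: the function name `ols_loglog` is hypothetical, and the standard error returned is the textbook OLS slope error, which is an assumption since the error estimation used in the paper is described elsewhere.

```python
from math import log10, sqrt


def ols_loglog(pop, co2):
    """Fit log10(CO2) = A + beta * log10(POP) by ordinary least squares.

    Returns (A, beta, r_squared, se_beta), where se_beta is the textbook
    OLS standard error of the slope (an assumption of this sketch).
    """
    x = [log10(p) for p in pop]
    y = [log10(c) for c in co2]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    a = mean_y - beta * mean_x
    ss_res = sum((yi - (a + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    se_beta = sqrt(ss_res / (n - 2) / sxx)
    return a, beta, r_squared, se_beta
```

Applied to noise-free synthetic data generated with A = 2.05 and β = 1.38, the function recovers both parameters up to floating-point error, which is a convenient sanity check before running it on cluster data.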

In addition, we investigate the robustness of the allometric
exponent as a function of the thresholds D* and ℓ. Figure 5a shows β as a
function of the cut-off length ℓ for different values of the population
threshold D* (1000, 2000, 3000, and 4000). We observe that β
increases with ℓ until a saturation value which is relatively
independent of D*. Performing an average of the exponent in the plateau
region with ℓ > 10 km over D*, we obtain β = 1.46 ± 0.02. Thus,
we find superlinear allometry indicating an inefficient emissions law
for cities: a 1% increase in city population is associated with an average
increase of about 1.46% in CO2 emissions, rather than the isometric 1%;
equivalently, doubling the population multiplies emissions by
2^1.46 ≈ 2.75 rather than 2. This positive non-extensivity suggests that the high
productivity found in larger cities [3,4] comes at the expense of a
disproportionally larger amount of emissions compared to small cities.
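As a quick check of what a plateau exponent of β = 1.46 implies for finite changes in population, the power law CO2 ~ POP^β can be evaluated directly. The helper name `emission_factor` is hypothetical and the snippet is only a worked-arithmetic sketch of the scaling relation.

```python
def emission_factor(pop_factor, beta):
    """Factor by which emissions change when population is multiplied by
    pop_factor, assuming the power law CO2 ~ POP**beta."""
    return pop_factor ** beta


# Doubling the population with beta = 1.46
total = emission_factor(2.0, 1.46)           # about 2.75
per_capita = total / 2.0                     # about 1.38
isometric = emission_factor(2.0, 1.0)        # exactly 2.0
```

Under this relation a doubling of population raises total emissions by roughly 175% and per capita emissions by roughly 38%, whereas the isometric case leaves per capita emissions unchanged.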

Figure 5b investigates the emissions of cities as deconstructed by
different sectors and activities of the economy. We perform
non-parametric regression with bootstrapped 95% confidence bands for
each sector (see Fig. 6 for D* = 1000 and ℓ = 5 km), plot β versus ℓ, and
find that the exponents for different sectors saturate to an
approximately constant value for ℓ > 10 km. We assign an average exponent
β over the plateau per sector, as seen in Table I. The sectors with
higher exponents (less efficient) are Residential, Industrial,
Commercial and Electricity Production, with β between 1.47 and 1.62, above
the average for the total emissions. Onroad vehicles contribute with a
superlinear exponent β = 1.42 ± 0.03, yet below the total average.
The exponent for Nonroad vehicles is also below the average at
β = 1.23 ± 0.05, while the Aircraft sector displays approximately isometric
scaling with β = 1.05 ± 0.01. Cement Production displays sublinear
scaling, β = 0.21 ± 0.03, although the reported data are less significant
than the rest, with only 20 data points of cities available.

Figure 3 | CCA stages: We consider that if D_i > D*, then the site i is
populated (light blue squares). Each site is defined by its geometric center
(black circles) and the length ℓ represents a cutoff on the distance to define
the nearest-neighbor sites. We aggregate all nearest-neighbor sites, i.e. a
CCA cluster is defined by populated sites within a distance smaller than ℓ (red
circles).

We further investigate the dependence of the allometric exponent
β on the income per capita of cities by aggregating the CCA clusters
by their income (INC) and plotting the obtained β(INC) in Fig. 7
(see also Fig. 8). We find an inverted U-shape relationship, which is
analogous to the so-called environmental Kuznets curve (EKC) [7,33,34].
We observe that β initially increases for cities with low income per
capita until an income turning point located at $37,235 per capita (in
2000 US dollars). After the turning point, β decreases, indicating an
environmental improvement for high-income cities. However, the
allometric exponent always remains larger than one regardless of the
income level (except for the lowest income), indicating that almost all
large cities are less efficient than small ones, no matter their income.

Figure 4 | Scaling of CO2 emissions versus population. We found a
superlinear relation between CO2 (metric tonnes/year) and POP with the
allometric scaling exponent β = 1.38 ± 0.03 (R² = 0.76) for the case ℓ =
5 km, D* = 1000. The solid (black) line is the Nadaraya-Watson estimator,
the dashed (black) lines are the lower and upper confidence interval, and
the solid (red) line is the linear regression.
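The income turning point can be recomputed from the regression coefficients reported in the caption of Fig. 7 (a0 = -247.35, a1 = 108.88, a2 = -11.91). The sketch below assumes, based on the turning-point expression given there, that the fitted curve is a quadratic in log10(INC), β = a0 + a1 log10(INC) + a2 log10(INC)²; this functional form and the function names are assumptions of the example.

```python
from math import log10


def beta_of_income(inc, a0, a1, a2):
    """Assumed quadratic form of the exponent in x = log10(income)."""
    x = log10(inc)
    return a0 + a1 * x + a2 * x * x


def turning_point(a1, a2):
    """Income at which the quadratic peaks: 10 ** (-a1 / (2 * a2))."""
    return 10 ** (-a1 / (2.0 * a2))


# Coefficients as reported in the Fig. 7 caption
A0, A1, A2 = -247.35, 108.88, -11.91
inc_star = turning_point(A1, A2)                 # close to 37,235 dollars
beta_peak = beta_of_income(inc_star, A0, A1, A2)
```

Because the coefficients are rounded to two decimals, the recomputed turning point agrees with the reported $37,235 only to within a few dollars, and the peak value of β from this sketch should be read as approximate.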

Comparison with MSA. A further important issue in the scaling of
cities is the dependence on the way they are defined [15-18,20,21,35]. Thus, it
is of interest to compare our results with definitions based on
administrative boundaries such as the commonly used
Metropolitan Statistical Areas (MSA) [36] provided by the US Census
Bureau [37]. MSAs are constructed from administrative boundaries
aggregating neighboring counties which are related
socioeconomically via, for instance, large commuting patterns. A drawback is
that MSAs are available only for a subset (274 cities) of the most
populated cities in the US, and therefore can represent only the
upper tail of the distribution [17,21,35] (see Sec. 3 for details).

Figure 5 | Behavior of allometric exponent β. (a) We plot β for the total emissions for different D* as a function of ℓ. The exponent β increases with ℓ
until a saturation value. (b) Allometric exponent versus ℓ for the different sectors of the economy as indicated. The scaling exponent ranges from
sublinear behavior (β < 1, optimal) in the cement and aircraft sectors, to superlinear behavior (β > 1, suboptimal) in nonroad and onroad vehicles, and
residential emissions, up to the least efficient sectors in commercial, industrial and electricity production activities.

Figure 6 | The plot shows the CO2 behavior measured in metric tonnes of carbon per year versus POP of the CCA clusters for different sectors. We
found a superlinear relation between CO2 and POP for all the cases, except for the Aircraft and Cement sectors. The solid (black) line is the Nadaraya-Watson
estimator, the dashed (black) lines are the lower and upper confidence interval, and the solid (red) line is the linear regression.

Table 1 | Allometric exponents for CO2 emissions according to different sectors and total emissions of all sectors ... 1000, indicated by ..., and the averaged value β over ℓ > 10 km and 1000 < D* < 4000. The number of ...

Furthermore, we find that the MSA construction violates the
expected extensivity [3,17] between the land area occupied by the MSA
and their population, since the MSA overestimates the area of the small
agglomerations [17]. This is indicated in Fig. 9, where we find the
regression:

$$\log(\mathrm{AREA_{MSA}}) = a_{\mathrm{MSA}} + b_{\mathrm{MSA}}\,\log(\mathrm{POP_{MSA}}) \qquad (2)$$

with a_MSA = 0.81 ± 0.36 and b_MSA = 0.51 ± 0.06 (R² = 0.48). This
approximate square-root law implies that the density is not constant
across the MSAs:

$$\rho_{\mathrm{MSA}} \sim \mathrm{POP}^{1/2} \qquad (3)$$

On the contrary, CCA clusters capture precisely the occupied area of
the agglomeration, leading to the expected extensive relation between
land area and population, as seen also in Fig. 9:

$$\log(\mathrm{AREA_{CCA}}) = a_{\mathrm{CCA}} + b_{\mathrm{CCA}}\,\log(\mathrm{POP_{CCA}}) \qquad (4)$$

with a_CCA = -2.86 ± 0.06 and b_CCA = 0.94 ± 0.01, with small
dispersion R² = 0.99, implying that the density of population of
CCA clusters is well-defined (extensive), i.e. it is constant across
population sizes,

$$\rho_{\mathrm{CCA}} \sim \mathrm{const.} \qquad (5)$$
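The step from the area regressions to the density statements in Eqs. 3 and 5 can be written out explicitly. The derivation below assumes only that density is defined as population divided by occupied area, ρ = POP/AREA; the symbols a and b stand for the fitted intercept and slope of either regression.

```latex
\begin{align*}
\log(\mathrm{AREA}) &= a + b\,\log(\mathrm{POP})
  \;\Longrightarrow\; \mathrm{AREA} = 10^{a}\,\mathrm{POP}^{\,b},\\
\rho \equiv \frac{\mathrm{POP}}{\mathrm{AREA}} &= 10^{-a}\,\mathrm{POP}^{\,1-b},\\
b_{\mathrm{MSA}} = 0.51 &\;\Rightarrow\; \rho_{\mathrm{MSA}} \sim \mathrm{POP}^{0.49} \approx \mathrm{POP}^{1/2},\\
b_{\mathrm{CCA}} = 0.94 &\;\Rightarrow\; \rho_{\mathrm{CCA}} \sim \mathrm{POP}^{0.06} \approx \mathrm{const.}
\end{align*}
```

The CCA density is therefore only approximately constant: the residual exponent 1 - b_CCA = 0.06 is small but not exactly zero.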

In summary, while the CCA displays an almost isometric relation
between population and area, the MSA shows a sublinear scaling
between these two measures. As a consequence, the emissions of
CCA clusters are independent of the population density, as expected. On the
other hand, from Eq. 2 and Eq. 6, the MSA leads to a superlinear
scaling between emissions and density: combining ρ_MSA ~ POP^(1/2) with
CO2 ~ POP^(β_MSA) gives CO2 ~ ρ_MSA^(2 β_MSA), an exponent of roughly 1.8.

Figure 7 | Dependence of allometric exponent β on the income per
capita of the CCA clusters. We found an inverted-U-shaped curve
similar to an environmental Kuznets curve (EKC). In other words, we
find a decrease of the allometric exponent β for the lower and higher
income levels, with the following regression coefficients: a_0 = -247.35,
a_1 = 108.88 and a_2 = -11.91. The income turning point is located at
10^(-a_1/(2 a_2)) = US$ 37,235.

The non-extensive character of the MSA areas is due to the fact
that many MSAs are constituted by aggregating small disconnected
clusters, resulting in large unpopulated areas inside the MSA. This is
exemplified in some typical MSAs plotted in Fig. 10, such as Las
Vegas, Albuquerque, Flagstaff and others. The plots show that a large
MSA area is associated with a series of disconnected small counties, as
seen, for instance, in the region near Las Vegas. This clustering of
disconnected small cities inside an MSA results in an overestimation
of the emissions associated with the Las Vegas MSA, for instance.
The same pattern is verified for many small cities, especially in the
Midwest of the US, as seen in the other panels. For some large cities, like
NY, the agglomeration captures similar shapes as in the occupied
areas obtained with CCA, although it is also clearly seen that the area
of the NY MSA contains many unoccupied regions. Therefore, the
occupied area of a typical MSA is overestimated in comparison to the
area that is actually populated as captured by the CCA, and the bias is
larger for small cities than for large ones. This endogeneity bias leads to
an overestimation of the CO2 emissions of the small cities as
compared to large cities. Consequently, we find a smaller allometric
exponent for MSA than for CCA, with an almost extensive relation:

$$\log(\mathrm{CO_2}) = A_{\mathrm{MSA}} + \beta_{\mathrm{MSA}}\,\log(\mathrm{POP}) \qquad (6)$$

with A_MSA = 1.08 ± 0.38 and β_MSA = 0.92 ± 0.07 (R² = 0.71, see
Fig. 11). This result is consistent with previous studies of the scaling of
emissions of MSAs by Fragkias et al. [36], who used MSAs and found a
linear scaling between emissions and size of the cities, and also
Rybski et al. [28], who used administrative boundaries to define 256
cities in 33 countries. Tables II and III summarize the results for
CCA and MSA cities.

Thus, the measurement bias in the MSAs leads to the smaller β found
for MSA as compared with CCA, since low-density MSAs have
relatively large areas. Hence, the CCA results, which are not subject to
that endogeneity bias, should be considered the main source of
information on emissions. They show a positive link between
emissions and population size as well as the expected extensive behavior
of the occupied land. This analysis highlights the need to use the
proper definition of cities when the scaling behavior of small cities
needs to be accurately represented. Indeed, this issue arises in the
controversy regarding the distribution of city size for small cities,
since the distribution of administrative cities (such as US Places)
is found to be broadly lognormal (that is, a power law in the tail that
deviates into a log-Gaussian for small cities) [21,39-42], while the
distribution of geography-based agglomerations like CCA is found to be
Zipf distributed along all cities (power law for all cities) [13-18,20].



SCIENTIFIC REPORTS | 4:4235 | DOI: 10.1038/srep04235






[Figure 8, panels (a)-(h): log CO2 versus log POP, one panel per income range from INC < 10^4.40 to INC > 10^4.70, each annotated with its fitted A and β.]
Figure 8 | Total CO2 emissions in metric tonnes of carbon per year versus POP of CCA clusters for different income ranges, as indicated. We found a
superlinear relation between CO2 and POP for all the cases except for the lowest income, below $25,119. The solid (black) line is the Nadaraya-Watson
estimator, the dashed (black) lines are the lower and upper confidence interval, and the solid (red) line is the linear regression. The resulting
exponent β(INC) is plotted in Fig. 7.







Figure 9 | Scaling of the occupied land area versus population for MSAs
and CCA clusters. Two problems are evident from this comparison. First,
the range of population obtained by MSA is two decades smaller than that
of CCA, since CCA captures all city sizes while MSA is defined only for the
top 274 cities. Second, the MSA violates the extensivity between land area
and population while CCA does not. This is due to the fact that the MSA
agglomerates many small cities into a single administrative
boundary with a large area which can be largely unpopulated, as can be seen
in the examples of Fig. 10. This results in an overestimation of the size of
the areas of small cities compared with large cities, resulting in the violation
of extensivity shown in the figure. This endogenous bias is absent in the
CCA definition. This bias in the small cities ultimately affects the
allometric exponent, yielding a β_MSA smaller than the one obtained using
the CCAs.



Discussion 

In general, we expect that when the scaling obtained by CCA is
extensive, any agglomeration of CCA clusters, such as an MSA, should give
rise to extensive scaling too. However, when there are intrinsic
long-range spatial correlations in the data (as in non-extensive systems
with β ≠ 1), agglomerating populated clusters (as done with MSA)
may give different allometric exponents depending on the particular
administrative boundary used to define cities. It is of interest to note
that, beyond MSA [36], there are other administrative boundaries used
in the literature to define cities, for instance the US Places studied
in [39-41]. This measurement bias is a generic property of any
non-extensive system, such as a physical system at a critical point.
Thus, scaling laws obtained using administrative boundaries to
define cities, which cluster data in a somewhat arbitrary manner,
may need to be taken with caution.

In summary, we find that CCA urban clusters in the US have
suboptimal CO2 emissions as measured by a superlinear allometric
exponent β > 1. The exponent β decreases for cities with low and
high income per capita, in agreement with an EKC hypothesis [7]. From
the point of view of allometry, larger cities may not represent an
improvement in CO2 emissions as compared with smaller cities.

Methods 

Population dataset. The United States population dataset for the year 2000 is a part of
the Global Rural-Urban Mapping Project (GRUMPv1). The GRUMPv1 is available in
shapefile format on the Latitude-Longitude projection (Fig. 12a) and it was developed
by the Center for International Earth Science Information Network (CIESIN) in collaboration
with the International Food Policy Research Institute (IFPRI), the World Bank, and
the Centro Internacional de Agricultura Tropical (CIAT) [22] (Fig. 1a). The GRUMPv1
combines data from administrative units and urban areas by applying a
mass-conserving algorithm named Global Rural Urban Mapping Programme (GRUMPe)
that reallocates people into urban areas, within each administrative unit, while
reflecting the United Nations (UN) national rural-urban percentage estimates as
closely as possible [22]. The administrative units (more than 70,000 units with
population > 1,000 inhabitants) are based on population census data and their
administrative boundaries. The urban areas (more than 27,500 areas with population
> 5,000 inhabitants) are based on night-time lights data from the National Oceanic
and Atmospheric Administration (NOAA) and buffered settlement centroids (in the
cases where night lights are not sufficiently bright). In order to provide
higher-resolution gridded population data (30 arc-seconds, equivalent to a grid of 0.926 km ×
0.926 km at the Equator), the GRUMPv1 assumes that the population density of
each administrative unit is constant and that the population of each site is proportional to
the administrative unit areas located inside that site. We exported the original data
to ASCII format on the Lambert Conformal Conic projection (Fig. 12b), available to
download at http://jamlab.org. The parameters of both projections are defined as follows:

Projection name: Latitude-Longitude (LL) 

Horizontal datum name: WGS84 

Ellipsoid name: WGS84 

Semi-major axis: 6378137 

Denominator of flattening ratio: 298.257224 

Projection name: Lambert Conformal Conic (LCC) 

Standard parallels: 33, 45 

Central meridian: -97 

Latitude of projection origin: 40 

False easting: 0 

False northing: 0 

Geographic coordinate system: NAD83 

Emissions dataset. The second dataset used in this study is the annual mean of the
United States fossil fuel carbon dioxide emissions on a grid of 10 km × 10 km for
the year 2002. Full documentation is available at
http://vulcan.project.asu.edu/pdf/Vulcan.documentation.v2.0.online.pdf. This dataset was compiled by the Vulcan
Project (VP) and it is already available in binary format on the Lambert Conformal
Conic projection defined above. The VP was developed by the School of Life Science at
Arizona State University in collaboration with investigators at Colorado State
University and Lawrence Berkeley National Laboratory [23]. The VP dataset is created
from five primary datasets, constituting eight data types: the National Emissions
Inventory (NEI), containing the Non-road data (county-level aggregation of mobile
surface sources that do not travel on roadways such as boats, trains, ATVs,
snowmobiles, etc.), the Non-point data (county-level aggregation of non-geocoded
sources), the Point data (non-electricity-producing sources identified as a specific
geocoded location) and the Airport data (geolocated sources associated with taxi,
takeoff, and landing cycles associated with air travel); the Emissions Tracking
System/Continuous Emissions Monitoring (ETS/CEM), containing the Electricity
production data (geolocated sources associated with the production of electricity);
the National Mobile Inventory Model (NMIM), containing the On-road data
(county-level aggregation of mobile road-based sources such as automobiles, buses,
and motorcycles); the Aero2k, containing the Aircraft data (gridded sources
associated with the airborne component of air travel); and finally, the Portland
Cement, containing the cement production data (geolocated sources associated with
cement production).

These data types supply the CO2 emissions sectors: Aircraft, Cement, Commercial,
Industrial, Non-road, On-road, Residential, and Electricity. In order to represent all
the sectors in a 10 km × 10 km grid, the VP assumes that the CO2 emissions of each
site are given by the contributions of the geocoded and non-geocoded (via
area-weighted proportions) sources located inside that site. We exported the original
data to ASCII format, available to download at
http://lev.ccny.cuny.edu/~hmakse/soft_data (Fig. 1b and Fig. 2).

Income per capita dataset. We also use the US income dataset made available in ASCII
format by the US Census Bureau [24] for the year 2000. This dataset provides the mean
household income per capita for the 3,092 US counties. For each county, we
combined the income data and the administrative boundaries (Fig. 1c) in order to
relate them with the geolocated datasets. The US county boundaries are also available
to download in ASCII format from the US Census Bureau [25]. However, we already joined
these datasets and provide them for download at
http://lev.ccny.cuny.edu/~hmakse/soft_data.

Superimposing the datasets. We superimposed the population and CO2 datasets on 
the Lambert Conformal Conic projection in order to estimate the CO2 emissions on a 
finer grid (0.926 km × 0.926 km). We checked whether each population site is inside 
a CO2 site. If so, we assigned it a CO2 value proportional to its area (0.926² km²), 
assuming that the CO2 density is constant within each CO2 site. For the population and 
income datasets, we checked whether each population site (actually, its center of mass) 
is inside some US county boundary. If so, we set the income value of that site 
equal to the income value of the county. We performed this test using the fact 
that a horizontal line (drawn toward the polygon) starting at a point inside a 
polygon crosses the polygon boundary an odd number of times, while a line starting at 
a point outside the polygon crosses it an even number of times. 
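
The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the paper: the function names, the representation of a boundary as a list of (x, y) vertices, and the hard-coded cell sizes (taken from the grid spacings quoted in the text) are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Return True if the point (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices (hypothetical representation).
    A horizontal ray is cast from the point towards +x; an odd number of
    boundary crossings means the point is inside, an even number outside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point
        # can be crossed; this also skips horizontal edges (y1 == y2).
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def assign_county_income(site_centers, counties):
    """Give each site center the income of the first county containing it.

    `counties` is a list of (boundary_polygon, income) pairs (assumed layout).
    Sites that fall in no county get None.
    """
    incomes = []
    for x, y in site_centers:
        value = None
        for boundary, income in counties:
            if point_in_polygon(x, y, boundary):
                value = income
                break
        incomes.append(value)
    return incomes


def downscale_co2(co2_coarse, fine_km=0.926, coarse_km=10.0):
    """CO2 assigned to one fine cell, assuming constant density in the coarse cell."""
    return co2_coarse * (fine_km ** 2) / (coarse_km ** 2)
```

Under the constant-density assumption, each 0.926 km × 0.926 km cell receives a fraction 0.926²/10² ≈ 0.0086 of the emissions of the 10 km × 10 km site that contains it.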

MSA. The definitions of Metropolitan Statistical Area (MSA), Primary Metropolitan 
Statistical Area (PMSA) and Consolidated Metropolitan Statistical Area (CMSA) are 
Figure 10 | Examples of MSA and CMSA combining the datasets from the Global Rural-Urban Mapping Project (GRUMPv1), the Vulcan Project (VP) and the 
US Census Bureau 22,23,37: (a)-(c) MSA of Albuquerque (Albuquerque, NM); (d)-(f) MSA of Flagstaff (Flagstaff, AZ-UT); (g)-(i) CMSA of Los Angeles 
(Los Angeles-Riverside-Orange County, CA); (j)-(l) MSA of Reno (Reno, NV); and (m)-(o) MSA of Las Vegas (Las Vegas, NV-AZ). In the first 
column, we plot the population as given by the GRUMPv1 dataset inside the administrative boundary of the MSA as provided by the US Census Bureau. 
The grey regions show the large unpopulated areas considered inside the MSA. The large MSA areas thus put together different populated clusters into one 
large administrative boundary. In the second column, we plot the CO2 emissions dataset inside the boundary of each MSA. The population and the CO2 
emissions are plotted in logarithmic scale according to the color bar at the bottom of the plot. In the third column, we plot the CCA clusters inside the 
corresponding MSA. Unlike the MSA, the CCA captures the contiguous occupied area of a city. 
Figure 11 | CO2 emissions in metric tonnes/year versus POP using the 
MSA/CMSA definition of cities for the total CO2 emissions (panel title: MSA/CMSA; 
horizontal axis: log POP). We found an almost extensive relation between CO2 and POP, 
with the allometric scaling exponent $\beta_{\mathrm{MSA}} = 0.92 \pm 0.07$ ($R^2 = 0.71$). 
The solid (black) line is the Nadaraya-Watson estimator, the dashed (black) lines are 
the lower and upper bounds of the confidence interval, and the solid (red) line is the 
linear regression. 

provided by the US Census Bureau 37. MSAs are geographic entities defined by 
a group of socioeconomically related counties with a population larger than 50,000. 
PMSAs are analogous to MSAs; however, they are defined by just one or two 
socioeconomically related counties with a population larger than 1,000,000. Finally, 
CMSAs are large metropolitan regions defined by several PMSAs close to each other. In 
order to relate the MSA/CMSA definition of cities to the CCA cities, we 
show the 15 most populated MSA/CMSA cities and the largest CCA cities associated 
with them in Tables I and II. The largest CCA city associated with a given MSA/CMSA is 
defined as the most populated CCA city whose center of mass is inside that MSA/CMSA 
boundary. All datasets are available for download from ref. 37, including the 
population and administrative boundaries of the MSA/CMSA. Additionally, we make 
them available at http://lev.ccny.cuny.edu/~hmakse/soft_data. 



Table II | Population ranking of the top 15 CCA cities for $D^* = 1000$ inhabitants 
and $\ell = 5$ km. The total number of cities for these parameters is N = 2281. The 
areas are given in km², the incomes per capita are given in US$ and the CO2 emissions 
are given in metric tonnes/year 

CCA city         Population     Area    Income         CO2 
New York         14,203,323    3,963    54,219  22,656,248 
Los Angeles      12,248,239    4,730    44,935  17,890,252 
Chicago           5,989,209    2,716    50,454  13,180,388 
San Francisco     4,135,709    1,604    66,141   3,628,217 
Miami             4,041,311    2,029    38,430   4,851,895 
Washington        3,981,576    2,077    61,052   6,689,123 
Philadelphia      3,147,779    1,408    48,568   6,350,115 
Dallas            2,987,071    1,797    49,563   4,225,519 
Houston           2,670,156    1,520    43,497   5,104,114 
Detroit           2,534,128    1,578    47,915   6,038,681 
Phoenix           2,221,393    1,295    46,914   2,616,811 
Boston            1,838,516      760    55,055   3,161,289 
San Diego         1,620,953      744    48,104   1,881,183 
Denver            1,539,876      958    53,282   3,294,302 
Seattle           1,176,431      752    54,636   1,872,446 

We compute the 95% ($\alpha = 0.05$) confidence interval (CI) using the so-called 
$\alpha/2$ quantile function over 500 random bootstrap samples drawn with replacement. 
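
As a rough illustration of this percentile bootstrap, the sketch below resamples the (X, Y) pairs with replacement 500 times and reads off the α/2 and 1 − α/2 quantiles of a statistic. The function names and the choice of the OLS slope as the example statistic are assumptions made here for illustration; any other estimator evaluated on the resampled pairs could be passed in the same way.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (example statistic, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_tt = sum((x - mean_x) ** 2 for x in xs)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / s_tt


def bootstrap_ci(xs, ys, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(xs, ys)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Draw n pairs with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(statistic([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    # Number of sorted estimates cut off on each side (alpha/2 of the samples)
    k = int((alpha / 2) * n_boot)
    return estimates[k], estimates[n_boot - 1 - k]
```

With `n_boot=500` and `alpha=0.05`, the interval is bounded by the 13th smallest and 13th largest resampled estimates.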

For our case, the distribution is the set of values $\{X_i, Y_i\} = \{\log(\mathrm{POP}_i), \log(\mathrm{CO2}_i)\}$, 
where i runs from 1 to the number of CCA cities N. Furthermore, we calculate the 
exponents by the ordinary least squares (OLS) method 47. Let us consider the terms 

$$ S_x = \sum_{i=1}^{N} X_i, \tag{9} $$



Nadaraya-Watson method. In order to calculate the allometric scaling exponents, we 
applied well-known statistical methods 31. For one data distribution $\{X_i, Y_i\}$, we apply 
the Nadaraya-Watson method 43,44 to construct the kernel smoother function, 

$$ m_h(x) = \frac{\sum_{i=1}^{N} K_h(x - X_i)\, Y_i}{\sum_{i=1}^{N} K_h(x - X_i)}, \tag{7} $$

where N is the number of points and $K_h(x - X_i)$ is a Gaussian kernel of the form 

$$ K_h(x - X_i) = \exp\left[-\frac{(x - X_i)^2}{2h^2}\right], \tag{8} $$



where the h is the bandwidth estimated by least squares cross-validation method 45 ' 4. 
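
A minimal sketch of equations (7) and (8) is given below, together with one possible reading of least-squares cross-validation as a leave-one-out search over a list of candidate bandwidths. The function names, the candidate grid and the leave-one-out form of the score are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) = exp(-u^2 / (2 h^2)), as in equation (8)."""
    return math.exp(-(u * u) / (2.0 * h * h))


def nadaraya_watson(x, xs, ys, h):
    """Kernel-weighted average of ys at position x, as in equation (7)."""
    weights = [gaussian_kernel(x - xi, h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the estimate is undefined at this x
        return float("nan")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    score = 0.0
    for i in range(len(xs)):
        # Predict point i from all the other points
        pred = nadaraya_watson(xs[i], xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], h)
        if math.isnan(pred):
            return math.inf
        score += (ys[i] - pred) ** 2
    return score / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))
```

Here `xs` and `ys` are plain Python lists holding the log-population and log-emission values; evaluating `nadaraya_watson` on a grid of x values gives the smoothed curve.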



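The remaining terms of the OLS fit are 

$$ S_y = \sum_{i=1}^{N} Y_i, \tag{11} $$

$$ S_{xy} = \sum_{i=1}^{N} X_i Y_i, \tag{12} $$

$$ t_i = X_i - \frac{S_x}{N}, \quad \text{and} \tag{13} $$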



Table III | Population ranking of the top 15 MSA/CMSA cities and the associated CCA (†) for $D^* = 1000$ inhabitants and $\ell = 5$ km. The 
areas are given in km² and the CO2 emissions are given in metric tonnes/year 

MSA/CMSA city    Population     Area         CO2   Population†   Area†        CO2† 
New York         21,199,865   28,752  49,533,908    14,203,323   3,963  22,656,248 
Los Angeles      16,373,645   88,092  36,896,108    12,248,239   4,730  17,890,252 
Chicago           9,157,540   18,012  32,759,994     5,989,209   2,716  13,180,388 
Washington        7,608,070   25,304  26,035,616     3,981,576   2,077   6,689,123 
San Francisco     7,039,362   19,462  15,969,389     4,135,709     207     379,911 
Philadelphia      6,188,463   15,788  18,462,316     3,147,779   1,408   6,350,115 
Boston            5,819,100   15,086  18,684,998     1,838,516     760   3,161,289 
Detroit           5,456,428   17,269  16,959,726     2,534,128   1,578   6,038,681 
Dallas            5,221,801   24,575  15,802,243     2,987,071   1,797   4,225,519 
Houston           4,669,571   21,105  30,483,362     2,670,156   1,520   5,104,114 
Atlanta           4,112,198   16,064  22,936,928     1,021,846     697   2,204,638 
Miami             3,876,380    8,748   6,824,965     4,041,311   2,029   4,851,895 
Seattle           3,554,760   19,834  10,489,945     1,176,431     752   1,872,446 
Phoenix           3,251,876   37,800   7,594,759     2,221,393   1,295   2,616,811 
Minneapolis       2,968,806   16,485  23,292,798     1,053,751     674   5,438,483 
Figure 12 | The map projections from the Global Rural-Urban Mapping Project (GRUMPv1) 22. (a) Latitude-Longitude projection and (b) Lambert 
Conformal Conic projection of the population map of continental US. 



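$$ S_{tt} = \sum_{i=1}^{N} t_i^2. \tag{14} $$

The regression coefficients (A and $\beta$ in the equation $Y = A + \beta X$) are given by 

$$ \beta = \frac{1}{S_{tt}} \sum_{i=1}^{N} t_i Y_i \quad \text{and} \quad A = \frac{S_y - \beta S_x}{N}. \tag{15} $$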



If the errors are normally and independently distributed, the standard error of each 
coefficient is given by 32 

$$ \mathrm{s.e.}(A) = t_{\alpha/2,\,N-2}\,\sigma_A \quad \text{and} \quad \mathrm{s.e.}(\beta) = t_{\alpha/2,\,N-2}\,\sigma_\beta, \tag{16} $$

where $t_{\alpha/2,\,N-2}$ is the critical value of the Student-t distribution for $\alpha/2 = 0.025$ of the CI and $N - 2$ degrees 
of freedom, and the variances $\sigma_A$ and $\sigma_\beta$ are given by, 



(7 A = 



and ^ = ^i- (17) 



Finally, we report the values of the regression coefficients as 

$$ A \pm \mathrm{s.e.}(A) \quad \text{and} \quad \beta \pm \mathrm{s.e.}(\beta). \tag{18} $$
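
The fit can be sketched as follows, using the sums $S_x$, $S_y$, $t_i$ and $S_{tt}$ defined in the text. Two points are assumptions made for this sketch rather than statements about the paper: the expressions for $\sigma_A$ and $\sigma_\beta$ are the textbook OLS ones based on the residual variance with N − 2 degrees of freedom, and the Student-t critical value is passed in as the argument `t_crit` (for example, looked up in a table) instead of being computed.

```python
import math


def ols_fit(xs, ys, t_crit):
    """Fit Y = A + beta * X and return (A, beta, s.e.(A), s.e.(beta)).

    `t_crit` is the Student-t critical value t_{alpha/2, N-2}, supplied by
    the caller (assumption of this sketch).
    """
    n = len(xs)
    s_x = sum(xs)
    s_y = sum(ys)
    t = [x - s_x / n for x in xs]          # centered X values
    s_tt = sum(ti * ti for ti in t)
    beta = sum(ti * y for ti, y in zip(t, ys)) / s_tt
    a = (s_y - beta * s_x) / n

    # Textbook OLS uncertainties (assumed form): residual variance with
    # N - 2 degrees of freedom, then the spread of each coefficient.
    rss = sum((y - (a + beta * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    sigma_beta = math.sqrt(sigma2 / s_tt)
    sigma_a = math.sqrt(sigma2 * (1.0 / n + (s_x / n) ** 2 / s_tt))
    return a, beta, t_crit * sigma_a, t_crit * sigma_beta
```

For the allometric fit, `xs` and `ys` would hold log(POP) and log(CO2) for each city, so that `beta` plays the role of the scaling exponent.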



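R-squared. $R^2$ is the coefficient of determination and is calculated 
as follows: 

$$ R^2 = 1 - \frac{\sum_{i=1}^{N} \left[Y_i - (A + \beta X_i)\right]^2}{\sum_{i=1}^{N} \left(Y_i - \bar{Y}\right)^2}, $$

where $\bar{Y}$ is the mean of the $Y_i$. The $R^2$ by emission sector and the average $\beta$ are given in Table I. 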
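
As a small worked example of this definition, the sketch below computes $R^2$ from a fitted intercept and slope. The function name and argument layout are hypothetical.

```python
def r_squared(xs, ys, a, beta):
    """Coefficient of determination for the fit Y = a + beta * X."""
    y_mean = sum(ys) / len(ys)
    # Sum of squared residuals around the fitted line
    ss_res = sum((y - (a + beta * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of Y
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

For the points (1, 1), (2, 3), (3, 2), (4, 4) with the least-squares line Y = 0.5 + 0.8 X, the residual sum of squares is 1.8 and the total sum of squares is 5, giving $R^2 = 1 - 1.8/5 = 0.64$.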